Maximum likelihood reconstruction for Ising models with asynchronous updates 
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We describe how the couplings in a non-equilibrium Ising model can be inferred from observing 
the model history. Two cases of an asynchronous update scheme are considered: one in which we 
know both the spin history and the update times (times at which an attempt was made to flip a 
spin) and one in which we only know the spin history (i.e., the times at which spins were actually 
flipped). In both cases, maximizing the likelihood of the data leads to exact learning rules for the 
couplings in the model. For the first case, we show that one can average over all possible choices 
of update times to obtain a learning rule that depends only on spin correlations and not on the 
specific spin history. For the second case, the same rule can be derived within a further decoupling 
approximation. We study all methods numerically for fully asymmetric Sherrington-Kirkpatrick 
models, varying the data length, system size, temperature, and external field. Good convergence is 
observed in accordance with the theoretical expectations. 
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Inferring the interactions between the elements of a 
network from their observed states is a challenging and 
important problem. It lies at the heart of many practical 
applications in data analysis, in particular for analyz- 
ing high-throughput measurements made from biological 
networks [l[ . Network reconstruction can be posed as an 
inverse problem in statistical physics, and various models 
have been studied from this perspective to gain insight 
into the theoretical aspects [2|-|4[ and important applica- 
tions 043 ■ Though the initial work was framed in terms 
of equilibrium models 0, 0] , recent attention has focused 
on inferring connectivity in non-equilibrium system be- 
cause of the wider generality and relevance to systems 
where one has data on the system over time jl0l - ll3| . 
The general problem in these studies can be described 
as follows. One is given a set of stochastic variables rep- 
resenting e.g. spin configurations, gene expression levels, 
neuronal activity, that evolve according to a given set 
of kinetic equations. One is then asked to find the pa- 
rameters of these equations, in particular the couplings 
between the observed variables, given a sample of their 
history. 

An attractive platform for stud ying this inverse prob- 
lem is the Ising model @, H, S Elf OS]. For the 
dynamical version of this model, one can consider cither 
synchronous or asynchronous updates. The synchronous 
case has been recently treated in [ll|, while the asyn- 
chronous case was treated in 2J] using dynamic mean- 
field approximations. For a symmetric coupling matrix, 
the asynchronous model will equilibrate to a Gibbs dis- 
tribution with an Ising model Hamiltonian, and the cou- 
plings in this case can be found by Boltzmann Learning 
251 ] . However, no exact likelihood-based learning algo- 



rithm for the asynchronous model has been derived when 
the coupling matrix does not have this special symme- 
try. In this paper we solve this general problem, using 
maximum-likelihood inference. Studying the kinetic ver- 
sion of this model is important for making the connection 
between the significant work on learning done for equilib- 
rium models and that done for non-equilibrium ones. Al- 
though here we focus on the Glauber kinetic Ising model, 
the principles and implications of our results are rather 
general. 

Kinetic Ising model with asynchronous up- 
dates. Consider N binary spins, Sj = 1 or — 1, i = 
1 • • ■ N, coupled to each other through a matrix J^- and 
each subject to an external field 6i. As indicated above, 
the coupling matrix need not be symmetric and, conse- 
quently, the system may not possess a Gibbs equilibrium 



state [26j] . One can describe this stochastic dynamical 
system in either of two ways: 

(1) Consider a time discretization with steps of size 
St. At each time step, update every spin with proba- 
bility j St, where 7 is a constant with dimension of in- 
verse time. We will assume 7 to be known a priori, not 
a parameter of the model to be determined. By "up- 
date" we mean assign it a new value Si(t + St) with 
probability (1 + Si(t + 1) tanhH^t)) /2 = cxp(si(t + 
St)Hi(t)) /2 cosh Hi(t), where H % (t) = 9 t + 'Z j J ji s j{ t ) 
is the total field acting on spin i at time t. Of course, 
it is possible that the new value, s,(i + St) is equal to 
the old one; updating a spin does not necessarily mean 
flipping it. Multiple spins can be updated in one time 
step, but if St is small enough in most time steps at most 
one spin is updated. The synchronously-updated model 
is the case j5t = 1. Thus, one can interpolate between 
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the synchronous and asynchronous models by varying 7. 
In this formulation, the model is doubly stochastic: the 
dynamics of one set of random variables (the spins) are 
conditional on the dynamics of the other (the updates). 
In this paper we set the temperature that conventionally 
appears in descriptions of this model equal to 1, because 
it can be absorbed into the definitions of the fields and 
couplings. Equivalently, our fields and couplings are in 
units of the temperature. 

(2) Starting from the Glauber master equation (27j : 
Then at every step every spin is flipped with a probability 
7<5t (1 — Sj(t) tanhiTj(i)) /2. As in scheme (1), multiple 
spins can flip in a single time step, but this only happens 
with probability of order (St) 2 . Thus, for small enough 
St (which is the limit we consider), in most time intervals 
at most one spin is flipped. 

The difference between the schemes is that the first has 
two sets of random variables, the update times (which we 
denote by {n}) and the spin histories {si(t)} : while the 
second contains only the {si(t)}. It is simple to show 
that marginalizing out the {r,} in the first scheme leads 
exactly to the second one. Thus, all averages over his- 
tories involving spins only (i.e., not involving the up- 
date times) will be exactly the same in the two schemes. 
Nevertheless, knowing "the history of the system" (i.e., a 
realization of its stochastic evolution) means something 
different in the two schemes. In the first we know all the 
update times, while in the second we only know those at 
which the updated spins flipped. We will see below that 
knowing these extra data influences how well we can infer 
the couplings from a data record of a given length. Which 
scheme is relevant for inferring the couplings from data 
depends on the specific nature of the system being model- 
ed and the data available. The "update times" may be 
meaningful and, if so, available in some cases and not 
in others. An example of a doubly stochastic system 
where this distinction can be relevant is a securities mar- 
ket 28, 29j]. Traders in such a market place limit orders: 
conditional offers to buy securities if their market price 
falls below a specified threshold, or to sell if the market 
price rises above a threshold. Other traders may then 
respond or choose not to respond to these offers; if they 
do, transactions take place. The limit offers are like the 
updates in the Ising model, and the actual transactions 
are like the spin flips. 

Two likelihoods to maximize. Consider the first of 
the above schemes. We suppose we are given a history 
of the system, i.e., the data s = {si(t)} and r = {it}, of 
length L = T/St steps, and we are asked to reconstruct 
the couplings and external fields. We do this by max- 
imizing the likelihood P{s,t) = P(s\t)p(t) over these 
parameters. For each spin i, the Tj are a (discretized) 
Poisson process, i.e., every t has probability jSt of being 
a member of the set r. Thus the probability of the up- 
date history, p(r), is independent of the model parame- 
ters, and we can take as the objective function log P(s\t), 



Ci =5^5^[« i (T i + 5t)fl<(T i )-log2coshfl i (r i )]. (1) 

i Ti 

This is just like the synchronous-update case except that 
the sum over times is only over the update times. It leads 
to a learning rule 

SJ l3 oc — i = y\si{Ti + St) - tanh(ifi(ri))] Si (Ti). (2) 

aJ ij T . 

This equation includes the learning rule for 6i under the 
convention Jio = 6i, so(t) = 1. We call this algorithm 
"spin- and update- history-based" , or "SUH" for short. 

In the other dynamical scheme, we know only the spin 
history, not that of the updates. Since this scheme is 
equivalent to the first one with the marginalized out, 
we treat it by maximizing P(s) = J2 T P(S\t)p(t) picj ]. 
This leads to an objective function 



i,t 



log 



(1 - -ySt)S s .( t+S t), Si (t) +7<**- 



2 cosh Hi(t) 



(3) 

Separating terms with and without spin flips, we can 
write the resulting learning rules in the form 



5Jij 



= ^2[ s i(t + St) - tanh(iZi(t))]sj-(t) 

flips 



i(t + St)[l - taah 2 (Hi(t))] Sj {t), (4) 



no flips 



again including the rule for the external fields with the 
convention J;o = So(t) — 1. We call this the "spin- 
history-only" ("SHO") algorithm. 

Reconstruction errors for both algorithms can be calcu- 
lated by analyzing the Fisher information matrices. For 
SHO the elements of the Fisher matrix read 



d 2 C 2 



dJijdJu 



flips 



tanh 2 ( J ff i (t))]s J (t)s z (t) 



+ 2S ik ^St ^2 Si(t + St)taxHx(Hi(t)) 

no flips 

x [1 -tanh 2 (ffi(t))]sj(t)si(i). (5) 

In the weak coupling limit, this matrix has nonzero ele- 
ments only for j = /, and the mean value of these non- 
zero elements yields the inverse of the mean square error 
(MSE). In the absence of external fields, the second term 
in Eq. ([5]) vanishes; thus, the mean square error in this 
case is 2/(LjSt), noting that the probability that a time 
step is a flip is jSt/2. 

For SUH the calculation is analogous. In the absence 
of external fields one can show that in the weak-coupling 
limit the mean square reconstruction error is (T^f)^ 1 = 
(L-fSt) -1 , i.e., a factor of two smaller than for SHO. 
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History-averaged learning. Both these learning 
rules utilize explicitly their respective full model histo- 
ries, both {Si(t)} and n for SUH and {s,{t)} for SHO. 
In this section we derive a third rule by averaging the 
one for SUH over all update histories. Consider first the 
quantity 

d{ Si {t) Sj (to)) = {sj(t + St) Sj (t )) - ( Si (t) Sj (t )) 
dt 5t->o St 

(6) 

where (• ■ ■) means an average over all realizations of the 
stochastic dynamics. Separating time steps into those at 
which an update occurred and those at which no update 
occurred, we have 

d(Si(Ti)Sj(t )} 



dt 

lim < ySt 



[(siin +5t)sj(t )) - (si(Tj)sjfto))] 
St 



(7) 



There is no contribution from steps with no flip because 
then Si (t + St) = Sj (t) and the numerator would be ex- 
actly zero. Thus we have expressed the average over all 
realizations of the first term in the learning rule in 
terms of a spin correlation function and its time deriva- 
tive: 



(SiiTi+S^Sj^)) 



1 ( d ( Sl (t)s 3 (t )) 
dt 



+ (s i {t)s j (t Q )). 

-„ =* 

(8) 

In averaging the second term in ([2]), the average over 
update times can safely be replaced by an average over 
all times, since the quantity tanh Hi(t)sj (t) is insensitive 
to whether an update is being made. Thus, averaging 
Eq. ([2]) over all possible histories yields 

SJtj oc 7 -1 Cy(0) + C,y(0) - (tanh(ffi(i)) Sj -(»). (9) 

where CV/(i) = (si(t + t)sj(t )). We will refer to the 
update rule given by © as the averaged-SUH rule, or 
"AVE" for short. This rule has the same structure as 
the one for the synchronous- update model [ll|, with 
(si(t + l)sj(t)) replaced by C(0) + j~ 1 C(0) and was 
stated in [24( , however there only in the context of match- 
ing correlation functions. AVE learning requires know- 
ing the equal-time correlation functions, their derivatives 
at t = 0, and (tanh(Hi(t))sj(t)). This latter quantity 
depends on the model parameters (through Hi(t)), so, 
in practice, estimating it at each learning step requires 
knowing the entire spin history. Hence, it requires ex- 
actly the same data as SHO learning. 

Can we derive an algorithm like (|9|) from SHO learn- 
ing by averaging over spin flip times in the same way 
we have done here by averaging SUH learning over up- 
date times? Let us denote the local fields at time t 
generated by the true model (the one that generated 



the data) by Hi(t), and, as before, the local field cal- 
culated using the inferred parameters as Hi(t). At each 
time step t, then, the probability of flipping spin i is 
jSt[l — s(t) tanh.ffj(t)]/2. We thus have to allot the first 
term in @ a weight ^St[l — s(t) tanhi?j(£)]/2 and the 
second a weight 1 — jSt[l — s(t) tanh Hi(t)}/2 « 1 (to 
first order in St for the whole expression). This yields 



dCx 



y / o 



2T 



eft [tanh - tanh Hi(t)] 
i{t) tanh Hi(t)} Sj (t). (10) 



The learning thus converges when the discrepancy 
tanh(if(t)) — tanh(.H(t)) is zero. Noting also that, by 
the arguments above leading to ((8]), the time average 

{tanhH{t) Sj (t)) t = 7 - 1 C'(0)/di + C*(0), (11) 

so we can write (flU)) as 

SJij oc 7 -1 aj(0) + Cy(0) - (tanh^(i) Sj -(i)) t 
+ ([tanh (i) - tanh tanh J H l (i)s J (<)) t (12) 

The first line is identical to the learning rule (|9]) that we 
derived above. We can obtain a learning rule heuristically 
by an ad hoc factorization of the average in the second 
line: 

([tanh-Hi(t) - tanh H t {t)]s t {t) tanh Hi(t)sj(t)) t « 
( tanh H l (t) - tanh Hi (t)sj (t)) t (si (t) tanh Hi (t)) t (13) 

This leads to 

SJij oc [7 _1 C« (0) + Cij(0) - (tanhffi^s^t))*] 

x ([l + s 4 (t)tanhl^)]) t . (14) 

This just amounts to varying the learning rate in (J9j> pro- 
portional to the time-averaged probability of not flipping 
according to the model, which is 1 plus corrections of 
order St. Thus we arrive by a different route at the AVE 
rule, ©• 

Next, we compare the performance of the three infer- 
ence algorithms: SUH, SHO, and AVE. We also compare 
their performances with those of the naive mean-field 
(nMF) and Thoulcss- Anderson-Palmer (TAP) approxi- 
mations to AVE investigated in j24[. We do this for 
fully asymmetric Sherrington-Kirkpatrick models 3 if : 
the couplings are zero-mean i.i.d. normal variables with 
variance [Jy] a ve = g 2 /N (and Jij is independent of Jji). 
Equivalently, we can think of the Jtj as having unit vari- 
ance and g as an inverse temperature. We study these 
at various values of g and external field 9. There are 
two other important parameters that influence the per- 
formance of the algorithms: the system size N and the 
data length L. As a performance measure, we use the 
MSE on the J y . 

Fig. [1] shows the performance of the algorithms. As 
anticipated above, the error for SUH is half of that for 
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SHO learning, as can be seen in Fig. [lja). The same 
panel shows, furthermore, that AVE and SHO appear to 
perform equally well for large enough data length. In ret- 
rospect, this is not surprising, since, as we noted, both 
algorithms effectively use the same data (the spin his- 
tory). For small data sets, the averaging that yields AVE 
from SHO may be prone to fluctuations yielding the two 
learning rules behaving differently. Fig. QJb) shows that 
the MSE for the exact algorithms is insensitive to N, 
while the approximate algorithms improve as N becomes 
larger (note however the opposite trend in Fig. [2(a)); in 
these calculations, the average numbers of updates and 
flips per spin were kept constant, taking L = 5 x 10 5 iV.) 
Fig. [He) shows that the performance of the three ex- 
act algorithms is also not sensitive at all to an external 
field 9, while nMF and TAP work noticeably less well 
with a non-zero 6. Finally, the effects of the inverse g 
or the temperature are depicted in Fig. Q](d). For fixed 
L, all the algorithms do worse at strong couplings (large 
g or low temperature). The nMF and TAP do so in a 
much more clear fashion at smaller g, growing approxi- 
mately exponentially with g for g greater than w 0.2. In 
the weak-coupling limit, all algorithms perform roughly 
similarly, except that SUH enjoys its factor-2 advantage 
(conferred by knowledge of the update times), as already 
seen in Fig. [Ha). 

In summary, we have addressed the problem of in- 
ferring the couplings in a non-equilibrium system: the 
asynchronous, fully- asymmetrically coupled kinetic Ising 
model. We showed how to derive three different learn- 
ing algorithms, utilizing three different levels of detail 
of the history of the system: the full spin and update 
history, the spin history only, and spin correlations at 
and near t = only. The three methods show perfor- 
mance that is promising in practical terms, agrees with 
theoretical expectations, and in particular is superior to 
approximate methods found earlier. Wc expect that the 
reasoning behind our techniquc(s) can be extended to a 
variety of inverse statistical mechanics problems beyond 
the particular case of the kinetic Ising model. 
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FIG. 1. (Color online) Mean square error (MSE) scaling with 
data length L, system size N, external field 9 and tempera- 
ture l/g, are shown in (a), (b), (c) and (d) respectively. In 
each figure, black square for nMF, red circle for TAP, blue up- 
per triangle for SHO, pink down triangle for AVE and green 
diamond for SUH respectively. The parameter values are gen- 
erally g=0.3, N=20, 9 = 0, L=10 7 except the variables in each 
subgraph. 
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